SHAPIRO
Overview
The SHAPIRO function performs the Shapiro-Wilk test, a widely used statistical test for assessing whether a sample of data was drawn from a normal distribution. This test is particularly valued for its statistical power, especially with small to moderate sample sizes, making it a preferred choice for normality testing in many applications.
The Shapiro-Wilk test was introduced by Samuel Shapiro and Martin Wilk in their 1965 paper “An analysis of variance test for normality (complete samples)”, published in Biometrika. The test computes a W statistic that measures how well the ordered sample values agree with the values expected from a normal distribution. W lies between 0 and 1: values close to 1 are consistent with normality, while markedly lower values suggest that the data deviate from a normal distribution.
The test statistic is computed as:
$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
where $x_{(i)}$ are the ordered sample values, $\bar{x}$ is the sample mean, and $a_i$ are coefficients derived from the expected values and covariance matrix of order statistics from a standard normal distribution.
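For the smallest admissible sample, n = 3, the coefficients can be written down directly: the symmetry of the normal distribution and the unit-length normalization of the coefficient vector give a = (-1/√2, 0, 1/√2). The sketch below uses that special case to compute W with the standard library only. The function name `shapiro_w_n3` is hypothetical, and the closed-form p-value is assumed to match the n = 3 special case of Royston's algorithm. Larger samples need the approximated coefficients that scipy.stats.shapiro computes internally, so this is an illustration of the formula rather than a replacement for the library call.

```python
import math


def shapiro_w_n3(sample):
    """Shapiro-Wilk W and p-value for exactly three observations (illustrative only)."""
    if len(sample) != 3:
        raise ValueError("this sketch only covers n = 3")

    x = sorted(float(v) for v in sample)   # ordered values x_(1) <= x_(2) <= x_(3)
    mean = sum(x) / 3.0
    ss = sum((v - mean) ** 2 for v in x)   # denominator: sum of squared deviations
    if ss == 0.0:
        raise ValueError("all values are identical; W is undefined")

    # For n = 3 the coefficients are a = (-1/sqrt(2), 0, 1/sqrt(2)).
    a = (-1.0 / math.sqrt(2.0), 0.0, 1.0 / math.sqrt(2.0))
    numerator = sum(ai * xi for ai, xi in zip(a, x)) ** 2

    # Clamp to the attainable range [0.75, 1] to absorb rounding error.
    w = min(1.0, max(0.75, numerator / ss))

    # Assumed closed form for n = 3:
    # p = (6 / pi) * (asin(sqrt(W)) - asin(sqrt(3/4)))
    p = (6.0 / math.pi) * (math.asin(math.sqrt(w)) - math.asin(math.sqrt(0.75)))
    return w, min(1.0, max(0.0, p))


w, p = shapiro_w_n3([2, 2.1, 2.2])
print(round(w, 4), round(p, 4))
```

Evenly spaced values such as 2, 2.1, 2.2 give W = 1 and p = 1 after rounding, consistent with Example 3. A sample with two tied values, such as 0, 0, 1, sits at the other extreme with W = 0.75.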
This implementation uses the SciPy library’s scipy.stats.shapiro function, which is based on the algorithm described by Royston (1995). For detailed documentation, see the SciPy shapiro reference (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html). The function requires at least three data points and returns both the W test statistic and a p-value. A small p-value (typically < 0.05) is evidence against the null hypothesis that the data were drawn from a normal distribution; a large p-value does not prove normality, it only means the test found no evidence against it.
Note that for sample sizes greater than 5,000, the W statistic remains accurate, but the p-value may be less reliable. For alternative normality tests, consider the Anderson-Darling test (scipy.stats.anderson) or the Kolmogorov-Smirnov test (scipy.stats.kstest), both available in SciPy.
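The decision rule and the large-sample caveat described above can be sketched as a small post-processing step. The helper name `interpret_shapiro`, its `n` and `alpha` parameters, and the wording of its messages are all hypothetical; it only assumes the [[statistic, p_value]] or error-string return shape documented for this function.

```python
def interpret_shapiro(result, n, alpha=0.05):
    """Turn a [[statistic, p_value]] result into a plain-language reading (hypothetical helper)."""
    if isinstance(result, str):
        # Problems are reported as an error message string; pass it through.
        return result

    statistic, p_value = result[0]
    if p_value < alpha:
        verdict = f"reject normality at alpha={alpha}"
    else:
        verdict = f"no evidence against normality at alpha={alpha}"

    # Above 5,000 observations the p-value may be less reliable.
    caveat = " (n > 5000: treat the p-value with caution)" if n > 5000 else ""
    return f"W={statistic:.4f}, p={p_value:.4f}: {verdict}{caveat}"


# Values taken from Example 2 and Example 1 on this page.
print(interpret_shapiro([[0.7194, 0.0098]], n=6))
print(interpret_shapiro([[0.9675, 0.8591]], n=5))
```

The 0.05 default mirrors the conventional threshold mentioned above; it is a convention rather than a property of the test, so callers may prefer to pass their own `alpha`.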
This example function is provided as-is without any representation of accuracy.
Excel Usage
=SHAPIRO(data)
data (list[list], required): 2D array of numeric sample data. Must contain at least three elements.
Returns (list[list]): 2D list [[statistic, p_value]], or error message string.
Examples
Example 1: Normally distributed data
Inputs:
| data |
|---|
| 1.2 |
| 2.3 |
| 1.8 |
| 2.1 |
| 1.7 |
Excel formula:
=SHAPIRO({1.2;2.3;1.8;2.1;1.7})
Expected output:
| statistic | p_value |
|---|---|
| 0.9675 | 0.8591 |
Example 2: Non-normal data (bimodal)
Inputs:
| data |
|---|
| 1 |
| 5 |
| 1.1 |
| 5.1 |
| 1.2 |
| 5.2 |
Excel formula:
=SHAPIRO({1;5;1.1;5.1;1.2;5.2})
Expected output:
| statistic | p_value |
|---|---|
| 0.7194 | 0.0098 |
Example 3: Small sample (minimum size)
Inputs:
| data |
|---|
| 2 |
| 2.1 |
| 2.2 |
Excel formula:
=SHAPIRO({2;2.1;2.2})
Expected output:
| statistic | p_value |
|---|---|
| 1 | 1 |
Example 4: Uniformly spaced data
Inputs:
| data |
|---|
| 0.1 |
| 0.4 |
| 0.7 |
| 1 |
| 1.3 |
| 1.6 |
Excel formula:
=SHAPIRO({0.1;0.4;0.7;1;1.3;1.6})
Expected output:
| statistic | p_value |
|---|---|
| 0.9819 | 0.9606 |
Python Code
import math
from scipy.stats import shapiro as scipy_shapiro

def shapiro(data):
    """
    Perform the Shapiro-Wilk test for normality.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric sample data. Must contain at least three elements.

    Returns:
        list[list]: 2D list [[statistic, p_value]], or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    data = to2d(data)

    if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
        return "Invalid input: data must be a 2D list."

    # Flatten 2D list to 1D
    flat = []
    for row in data:
        for x in row:
            try:
                flat.append(float(x))
            except (TypeError, ValueError):
                return "Invalid input: data must contain numeric values."

    if len(flat) < 3:
        return "Invalid input: data must contain at least three numeric values."

    try:
        stat, p = scipy_shapiro(flat)
    except Exception as e:
        return f"scipy.stats.shapiro error: {e}"

    # Check for nan/inf
    if math.isnan(stat) or math.isnan(p) or math.isinf(stat) or math.isinf(p):
        return "Error: Output is nan or inf."

    return [[float(stat), float(p)]]